persist and recover shard states. #41
Conversation
Avoid using the BytesMount in tests, as it encodes the bytes as base64 in the URL.
Great stuff! Just some nits.
- Needs some solid tests.
- I'm worried about FSM idempotency after resetting the state to ShardStateNew: we need to ensure we don't end up with more than one transient copy if we'd already fetched one before going down and restarting. I'll write a test for that.
return fmt.Errorf("failed to apply mount upgrader: %w", err)
}

s.indexed = ps.Indexed
I think rather than keeping track of this indexed field across restarts (or even at all), we can always ask the IndexRepo if it has the index we need. That'll make it one less piece of mutable state to update/track/keep in sync with the IndexRepo.
Yeah, that simplification is worth it. We can use StatFullIndex to check if the index exists.
@@ -40,5 +41,5 @@ type Shard struct {
	wAcquire []*waiter
	wDestroy *waiter

-	refs uint32 // count of DAG accessors currently open
+	refs uint32 // number of DAG accessors currently open
nit: would be nice to have a comment here about which of these fields are persisted and which aren't.
Will add.
Closes #32.
Closes #24.
Closes #13.
This PR fleshes out another important part of the system: shard state persistence and resumption upon restart, and it also rewrites the mount registry/factory abstraction.
Persisting shard state
On every iteration of the event loop, we persist the current shard state in a datastore.Datastore provided by the user, which we assume to be namespaced. Shards are persisted under their shard key, and the state is serialised as JSON. There is unused code for switching to CBOR in the future, if we find that to be more performant.
We can add extra smarts to avoid persisting when unnecessary (e.g. when the persisted entity would be identical to the last one written). We can easily do this because PersistentShard is a comparable struct, so we could retain the last written PersistentShard in the shard state, calculate the new one, and skip the datastore put if they're identical. TODO in #44.

We also need to clean up the serialization logic. I don't like MarshalJSON and UnmarshalJSON
doing this much.

Resuming the DAG store
On start, we restore the state from the Datastore. We iterate over and recover all shards, overwriting some interim states.
Mount registry and factory
I revisited the mount.Registry and mount.MountFactory design, simplifying it by removing the latter. Instead, we use a Mount instance as a template, and we clone it every time we instantiate a new mount of that type. This allows us to set "environmental" properties that get applied automatically to all mounts of that type, such as a JSON-RPC endpoint in the Lotus mount.
An example of this is used in the tests with the new mount.FSMount, which I replaced mount.BytesMount with for testing, since we can't afford to serialize full CAR files into base64-encoded URLs every time we persist shards.

Finally, the new registry design allows us to register the same mount type under many different schemes, with different templates. This is important if, for example, you have various fs.FS instances you want to serve from (for testing), or you want to cater to more than one Lotus deployment. Since we do type matching, you do need to create new types when doing that.